Method and device for detecting double-talk, and echo canceler

ABSTRACT

This invention relates to a double talk detecting method suited to be used for an echo canceler, and more particularly to a method for correctly detecting whether or not the double talk is present, even in the case where a speech transmitting level difference between a far-end talker and a near-end talker is comparatively large. What is assumed in the present invention is an echo canceler which can be employed in a telephone line network for transmitting a speech of a far-end talker sent through a digital line and a speech of a near-end talker sent through an analog line. A double talk detecting method applicable to the echo canceler here extracts predetermined analyzed parameters respectively from a first speech signal corresponding to the speech of the far-end talker sent from the echo canceler to a hybrid circuit and the speech of the near-end talker input to the echo canceler through the hybrid circuit. The analyzed parameters herein used include, for example, speech pitches, frequency components, etc., of each of the speech signals. In this invention, the analyzed parameters themselves or correlation values of the fluctuations of the analyzed parameters, etc. are calculated and a detection is made as to whether or not the double talk is present, based on the result of calculation.

TECHNICAL FIELD

This invention relates to a double talk detecting method which is carried out in a mobile communications network and in a long-distance telephone line network when echo is canceled, and to a double talk detecting apparatus and an echo canceler which are suited to be used for carrying out the afore-mentioned method.

BACKGROUND ART

In a long-distance telephone line via a submarine cable or via a communication satellite, the subscriber's line connected to each end of the line is, in general, a two-wire circuit, and the long-distance transmission portion is a four-wire circuit employed for signal amplification and some other purposes. Similarly, in a mobile communications network using a mobile telephone (or cellular phone), the subscriber's line of a terrestrial analog telephone is a two-wire circuit and the portion from a terminal of the mobile telephone to a switch, etc. is a four-wire circuit. In this case, the connection region between the two-wire and four-wire circuits is provided with a hybrid circuit for performing a four-wire/two-wire conversion. This hybrid circuit is designed to match the impedance of the two-wire circuit. However, since it is difficult to always obtain a good matching condition, a received signal reaching the four-wire input side of the hybrid circuit tends to leak toward the four-wire output side, thereby generating a so-called echo. Since such an echo reaches the talker at a lower sound level than the talker's voice and after a delay of a predetermined time period, a speech hindrance is created. Such a speech hindrance caused by echo becomes more significant as the signal propagation time becomes longer. Particularly, in the case of mobile communication with a mobile telephone, since various processing procedures are carried out in the radio communication section leading to the switch, etc., the signal delay is increased, so that the speech hindrance caused by echo becomes a particular problem.

As apparatuses for preventing echo, there are the echo suppressor and the echo canceler. FIG. 1 shows a schematic construction of an echo canceler which can be used in a mobile communications network. The echo canceler 1 illustrated here is located at a stage in front of a hybrid circuit 2. In this illustration, the subscriber of an analog telephone is referred to as the “near-end talker” and the subscriber of a mobile telephone as the “far-end talker”. A far-end speech signal input into the echo canceler 1 is represented by Rin; a far-end speech signal output from the echo canceler 1, by Rout; a near-end speech signal input into the echo canceler 1, by Sin; and a near-end speech signal output from the echo canceler 1, by Sout.

The echo canceler 1 shown in FIG. 1 comprises an echo path estimation circuit/echo replica generator 3, a control unit 4, an adder 5, and a non-linear processor 6. Here, the echo path estimation circuit/echo replica generator 3 detects a response characteristic of the hybrid circuit 2 based on both the far-end speech input Rin and the near-end speech input Sin and estimates an echo path (namely, the line along which the echo propagates). Then, an anticipated echo (namely, an echo replica) from the hybrid circuit 2 is generated through a convolution of the estimation result with the far-end speech input Rin. In the adder 5, this echo replica is subtracted from the near-end speech input Sin, thereby canceling the echo. As the above-mentioned echo path estimation algorithm, a learning identification algorithm is used. Among many adaptive algorithms, this learning identification algorithm is comparatively small in computational complexity and good in convergence characteristic.

As shown in FIG. 1, the echo path estimation circuit/echo replica generator 3 includes an echo path estimation circuit 3 a, an H-register 3 b, and an echo replica generator 3 c. In this case, the echo path estimation circuit 3 a estimates an echo path using the learning identification algorithm and writes a tap coefficient (described later) corresponding to the estimated echo path in the H-register 3 b. The echo replica generator 3 c comprises an FIR adaptive digital filter. The generator 3 c generates an echo replica through a convolution of the tap coefficient in the H-register 3 b with the far-end speech input Rin. The learning identification algorithm is a known estimation algorithm as disclosed, for example, in the Institute of Electronics and Communication Engineers of Japan (IECE) Journal '77/11, Vol. J60-A, No. 11, in the article entitled “Regarding Echo Canceling Characteristic of Echo Canceler Using Learning Identification Algorithm” (written by Itakura and Nishikawa). The outline of the learning identification algorithm discussed in this article will be briefly described.

Firstly, if the impulse response h(t) and the input signal x(t) are used, presuming that the signal propagation characteristic of the echo path is linear, the echo $y_k$ at the time kT (where T is a sampling interval) can be expressed as follows.

$y_k = h^{t} x_k$  (1)

where:

$h = (h_1, h_2, \ldots, h_n), \quad h_j = h(jT)$

$x_k = (x_{k-1}, x_{k-2}, \ldots, x_{k-n})^{t}, \quad x_j = x(jT)$  (2)

(where $^{t}$ denotes transposition of a vector)

On the other hand, if the estimation value of h at the time kT is represented by $H_k$ (hereinafter referred to as the “tap coefficient”), an estimation value $Y_k$ of $y_k$ can be given by the following expression.

$Y_k = H_k^{t} x_k$  (3)

Then, a successive correction of $H_k$ according to the learning identification algorithm is made by

$H_{k+1} = \begin{cases} H_k + \alpha \, x_k e_k / (x_k^{t} x_k) & (x_k^{t} x_k \geq ne^{2}) \\ H_k & (x_k^{t} x_k < ne^{2}) \end{cases}$  (4)

where:

$e_k = y_k - Y_k$  (5)

Namely, $e_k$ is a residual echo. This residual echo appears on the output side of the adder 5. As apparent from the above expression (5), the next tap coefficient $H_{k+1}$ is calculated so that the residual echo will be reduced. Through calculation in the digital circuit, the above-mentioned algorithm can be specifically expressed as listed below. Firstly, the far-end speech input Rin, which is taken into the echo path estimation circuit 3 a, is handled as a digital signal $X_t$ (where t is a sampling time) having N sample values.

$X_t = (x(t), x(t-1), \ldots, x(t-(N-1)))$  (6)

If the tap coefficient $H_t$ at the time t written in the H-register 3 b can be expressed as follows,

$H_t = (h_t(0), h_t(1), \ldots, h_t(N-1))$  (7)

the convolutional operation in the echo replica generator 3 c (FIR filter) can be expressed as follows.

$Y(t) = \sum_{i=0}^{N-1} x(t-i) \times h_t(i)$  (8)

If the inner product of vectors is represented by “*” here, the above expression (8) can be rewritten as follows.

$Y(t) = X_t * H_t$  (9)

Now, if the residual echo obtained on the output side of the adder 5 is represented by er(t), the following expression can be obtained.

$er(t) = e(t) - Y(t)$  (10)

From the expressions so far listed, a fluctuation $\Delta H_t$ of $H_t$ can be expressed as follows.

$\Delta H_t = g \times er(t) \times X_t / (X_t * X_t)$  (11)

$H_{t+1}$ can be expressed as follows.

$H_{t+1} = H_t + \Delta H_t$  (12)

Therefore, the echo path estimation circuit 3 a reads the tap coefficient H in the H-register 3 b. By adding $\Delta H_t$, which is calculated by the expression (11), to the tap coefficient H thus read, the echo path estimation circuit 3 a calculates the next tap coefficient $H_{t+1}$ and writes it in the H-register 3 b. In this way, the tap coefficients H in the H-register 3 b are successively renewed. What has been described so far is a specific computation in the digital circuit according to the learning identification algorithm. The above expressions (6) to (12) are disclosed in Japanese Patent Laid-Open Application No. Hei 5-129989 and elsewhere.
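The update of expressions (8) to (12) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the circuit of FIG. 1: the function names, the step gain value g = 0.5, the small power floor `eps`, and the noise-free simulated echo path in `demo` are all hypothetical choices made for the example.

```python
import random


def nlms_step(h, x_vec, s_in, g=0.5, eps=1e-8):
    """One learning-identification update of the tap coefficients.

    h     : current tap coefficients H_t (list of N floats)
    x_vec : far-end samples X_t = [x(t), x(t-1), ..., x(t-(N-1))]
    s_in  : near-end input sample that contains the echo
    g     : step gain (assumed value, not taken from the text)
    eps   : power floor below which the taps are left unchanged (assumed)
    """
    # Echo replica, expressions (8)/(9): inner product of X_t and H_t.
    y = sum(xi * hi for xi, hi in zip(x_vec, h))
    # Residual echo on the output side of the adder, expression (10).
    er = s_in - y
    power = sum(xi * xi for xi in x_vec)
    if power < eps:
        # Too little far-end power: keep the taps as they are.
        return list(h), er
    # Expressions (11) and (12): H_{t+1} = H_t + g * er * X_t / (X_t * X_t).
    scale = g * er / power
    return [hi + scale * xi for hi, xi in zip(h, x_vec)], er


def identify_echo_path(true_h, far_end, g=0.5):
    """Run the update over a far-end signal against a simulated echo path."""
    n = len(true_h)
    h = [0.0] * n
    history = [0.0] * n
    for sample in far_end:
        history = [sample] + history[:-1]
        echo = sum(xi * hi for xi, hi in zip(history, true_h))
        h, _ = nlms_step(h, history, echo, g)
    return h


def demo(seed=0, samples=2000):
    """Far-end single talk only: the estimate should approach the true path."""
    rng = random.Random(seed)
    true_h = [0.5, -0.3, 0.1]
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(samples)]
    return true_h, identify_echo_path(true_h, far_end)
```

In this sketch the near-end input consists of echo only, which corresponds to the condition under which learning is meant to run; adding near-end speech to `echo` would disturb the estimate, which is the mis-learning that double talk detection is intended to prevent.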

For the above learning to be enabled, the following requirements must be met.

① A far-end speech output Rout of a level sufficient for an echo to come back as a near-end speech input Sin is present. In other words, the far-end talker is currently engaged in speech.

② The near-end speech input Sin consists merely of an echo (or an echo and white noise). In other words, the near-end talker is not engaged in speech.

On the other hand, when the far-end talker is in a speechless condition, or when the far-end talker and the near-end talker are simultaneously engaged in speech (this state is hereinafter referred to as the “double talk”), it is necessary to turn off the learning function because the echo path estimation may otherwise fall into a mis-learning state.

In the transmission line, digital signals are transmitted, and a D/A conversion (in general, a μ-law conversion) is made between the echo canceler 1 adapted to process such digital signals and the hybrid circuit 2 adapted to undertake a conversion to the analog line. For this reason, a non-linear relation is established between the far-end speech output Rout and the near-end speech input Sin. Therefore, echo cannot be canceled completely through the linear computation by the echo path estimation circuit/echo replica generator 3 alone. As a consequence, an echo component which cannot be completely canceled is produced. In order to remove such an echo component (hereinafter referred to as the “residual echo”), the non-linear processor 6 is employed. This non-linear processor 6 undertakes a non-linear switching operation. Specifically, in case the near-end speech output Sout consists merely of an echo, in other words, in case only the far-end talker is currently engaged in speech (this state is hereinafter referred to as the “far-end talker's single talk”), a switching operation is made such that the transmission of the near-end speech output Sout is prohibited, or such that the near-end speech output Sout is replaced by a pseudo noise.

The control unit 4 controls the echo path estimation circuit/echo replica generator 3 and the non-linear processor 6. That is, the control unit 4 detects the far-end talker's speechless condition or detects the double talk, controls the ON/OFF state of the learning function of the echo path estimation, detects the far-end talker's single talk, and controls the switching operation of the non-linear processor 6.

As a method for detecting the double talk in the control unit 4, a power ratio of the far-end speech output Rout to the near-end speech input Sin has heretofore been used; if this ratio exceeds an expected echo level (for example, the maximum echo level 6 dB specified by CCITT standards), it is judged that the double talk has occurred. However, this conventional double talk detecting method has the problem that the detection is delayed. That is, in case there is no sufficient level difference at the beginning of the double talk, the double talk is not detected; only when the level difference exceeds a predetermined value is the double talk detected. As a consequence, the double talk is not detected with good timing. Also, in the case where the speech levels of the far-end talker and the near-end talker are greatly different, the double talk cannot be detected effectively.

Namely, in the case where the power for transmitting the far-end talker's speech is larger than the power for transmitting the near-end talker's speech, the ratio of the generated echo power to the transmitting power of the near-end talker's speech becomes small. In such a case, the difference between the power for transmitting the echo and the power for transmitting the near-end talker's speech is reduced and therefore, it becomes difficult to distinguish the echo from the near-end talker's speech. As a consequence, it becomes difficult to detect the double talk accurately.

The low accuracy of the double talk detection may cause mis-learning of the echo path estimation. Such mis-learning not only deteriorates the function of echo cancellation but also generates a wrong echo replica, thereby sending noise to the far-end talker, etc.
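The level-based judgment and its weakness can be illustrated with a short Python sketch. It is a minimal illustration rather than the procedure of any standard: the function names, the frame-wise mean power, and the reading of the 6 dB figure as an assumed minimum loss between Rout and its echo in Sin are assumptions introduced here.

```python
def mean_power(frame):
    """Mean square value of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def level_based_double_talk(rout_frame, sin_frame, echo_loss_db=6.0):
    """Conventional level comparison (sketch).

    Double talk is declared when the Sin power exceeds the level that an
    echo of Rout alone could reach, assuming the echo is attenuated by at
    least echo_loss_db (assumed interpretation of the 6 dB figure).
    """
    limit = mean_power(rout_frame) * 10.0 ** (-echo_loss_db / 10.0)
    return mean_power(sin_frame) > limit


# Far-end speech and its echo, attenuated to one quarter of the amplitude.
rout = [1.0, -1.0] * 80
echo = [0.25 * s for s in rout]

# A loud near-end talker is detected ...
loud_near = [e + 0.8 for e in echo]
# ... but a quiet near-end talker stays below the limit and is missed.
quiet_near = [e + 0.2 for e in echo]
```

With these values, `level_based_double_talk(rout, loud_near)` is true while `level_based_double_talk(rout, quiet_near)` is false, although the near-end talker is speaking in both cases. This is the situation in which the level difference between the two talkers prevents a correct detection.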

DISCLOSURE OF INVENTION

The present invention has been accomplished in view of the background mentioned above. It is, therefore, a first object of the invention to provide a double talk detecting method in which features specified by a far-end speech signal and a near-end speech signal are extracted to detect the double talk correctly. A second object of the invention is to provide a double talk detecting apparatus and an echo canceler which are suited to be used for carrying out the afore-mentioned method.

In order to achieve the above objects, according to the present invention, there is provided a double talk detecting method applicable to an echo canceler which is employed in a telephone line network for performing a transmission between a speech coming through a four-wire circuit and another speech coming through a two-wire circuit, comprising:

a first step of extracting a first feature from a first speech signal corresponding to a speech on the four-wire circuit side, the first feature being specified by a waveform of the first speech signal;

a second step of extracting a second feature from a second speech signal corresponding to a speech on the two-wire circuit side, the second feature being specified by a waveform of the second speech signal; and

a third step of comparing the first feature with the second feature and judging whether or not a double talk is present, based on a result of the comparison.

What is most easily extracted as the first and second features here includes various kinds of analyzed parameters such as pitches, formants and band widths of those speech signals. In that case, the speech signals may be divided into a plurality of frequency bands in order to compare the first and second speech signals in each frequency band. Further, what is meant by the expression “the features specified by the waveforms of the speech signals” is, in some cases, a mere sampling result of those speech signals. Let us presume here, for example, that the double talk detecting method is applied to an echo canceler and that the learning state in the echo canceler fluctuates depending on the sampling result of the speech signals. If the learning state in the echo canceler fluctuates when the double talk occurs, a judgment can be made as to whether or not the double talk is present with reference to the learning state itself.

According to the teaching of the present invention, since it is judged whether or not the double talk is present based on the above-mentioned features, a correct judgment can be made even in the case where the difference of speech transmitting levels between the far-end talker and the near-end talker is great.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram showing a construction of a conventional echo canceler.

FIG. 2 is a block diagram showing a schematic construction of an apparatus embodying a double talk detecting method according to a first embodiment of the present invention.

FIG. 3 is a block diagram showing a schematic construction of an apparatus embodying a double talk detecting method according to a second embodiment of the present invention.

FIG. 4 is a block diagram showing a schematic construction of an apparatus embodying a double talk detecting method according to a third embodiment of the present invention.

FIG. 5 is a block diagram showing a construction of an important portion of the apparatus shown in FIG. 4.

FIG. 6 is a graph showing a general impulse response of an echo path.

FIG. 7 is a graph showing an impulse response of an echo path when a delay is involved.

FIG. 8 is a block diagram showing a schematic construction of an apparatus embodying a double talk detecting method according to a fourth embodiment of the present invention.

FIG. 9 is a block diagram showing one example of a detailed construction of a comparator used in the fourth embodiment.

BEST MODE FOR CARRYING OUT THE INVENTION

(First Embodiment)

FIG. 2 is a block diagram showing a schematic construction of an apparatus embodying a double talk detecting method according to the present invention. This apparatus is located in the afore-mentioned control unit 4. In this illustration, reference numeral 11 denotes an analyzed parameter extracting/analyzing unit. In this embodiment, speech pitches are employed as the analyzed parameters which are used for detecting the double talk. The speech pitch indicates the vibration cycle of the human vocal cords. For example, in the case of an active speech, the vocal cords vibrate and a predetermined basic cycle (or basic frequency) appears in the vibration. The analyzed parameter extracting/analyzing unit 11 extracts such speech pitches. To do so, some processing is required; for example, discrimination of an active speech from a speechless condition, detection of the basic cycle in the case of active speech, and the like. Since such techniques have already been proposed in various forms and are known, detailed description thereof is omitted here.

The analyzed parameter extracting/analyzing unit 11 extracts analyzed parameters Pr(i) and Ps(i) (where i denotes the time and i=0, 1, . . . ) respectively from the far-end speech output Rout and from the near-end speech input Sin. In this case, the input speech signals are measured and accumulated for a certain time (for example, about 10 ms) and thereafter the analyzed parameters are extracted. That is, it takes about 10 ms until the first analyzed parameter is extracted, but thereafter the analyzed parameters are extracted at intervals corresponding to the sampling intervals.

The above-mentioned analyzed parameters Pr(i) and Ps(i) refer to values at each time. The comparator unit 12 compares the analyzed parameters Pr(i) and Ps(i) at each input time. If the analyzed parameters Pr(i) and Ps(i) are equal to each other, or if they are determined to be equivalent to each other, it can be determined that only an echo caused by the speech of the far-end talker is present. In contrast, if the parameters Pr(i) and Ps(i) are different from each other, it is judged that the speeches of the far-end talker and the near-end talker are being transmitted simultaneously, i.e., that the double talk has occurred. In such a case, the comparator unit 12 sends a signal DT indicative of the occurrence of the double talk to the echo path estimation circuit/echo replica generator 3. As a consequence, the learning function of estimation of an echo path is turned off.

The comparison of the analyzed parameters will be described in more detail. For example, in the case where only the far-end talker is engaged in speech and an echo is present, Rout corresponds to the speech of the far-end talker whereas Sin corresponds to its echo. In this case, the pitches of the speech of the far-end talker and of its echo become generally equal to each other after a predetermined time. Since a delay of several tens of ms (for example, about 60 ms) occurs between the input and the output of the hybrid circuit 2, i.e., between the signals Rout and Sin, the analyzed parameters of the two are, indeed, slightly different if they are compared exactly at the same time. However, if the analyzed parameters of the two are compared within a range of a predetermined time, taking such a delay into consideration, they are almost the same. In this case, namely, in the case where only an echo is present, the learning function of estimation of an echo path works.

On the other hand, in the case where both the far-end talker and the near-end talker are engaged in speech, namely, in the case where the double talk is present, Rout corresponds to the speech of the far-end talker whereas Sin corresponds to the speech of the near-end talker. In this case, the speech pitches of the two are necessarily different. The reason is that the content of the speech of the far-end talker is different from that of the near-end talker. Even if the speeches of the far-end and near-end talkers are exactly the same in content, their speech pitches are necessarily different because the characteristics of their vocal cords are different. In such a case, it can be judged that the double talk is present.

Heretofore, since the judgment as to whether or not the double talk is present has been made by comparing levels, the accuracy of the double talk detection has depended on the difference in speech transmitting levels between the far-end talker and the near-end talker. In this embodiment, however, since the comparison is based on analyzed parameters, such as speech pitches, which do not depend on levels, the double talk can be detected correctly.
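The comparison of the first embodiment can be sketched as follows. The sketch assumes details that the text leaves open: pitch values are given as periods per analysis time with `None` marking a speechless or unvoiced instant, "equivalent" is taken to mean agreement within a relative tolerance `tol`, and the delay through the hybrid circuit is covered by searching the last `max_delay` values of Pr. All names and default values are hypothetical.

```python
def pitch_double_talk(pr, ps, i, max_delay, tol=0.05):
    """Judge double talk at time index i from pitch parameters (sketch).

    pr, ps    : pitch values Pr(i), Ps(i) extracted from Rout and Sin;
                None marks an instant without a detectable pitch (assumed)
    max_delay : largest echo delay, in analysis steps, to allow for (assumed)
    tol       : relative tolerance for "equivalent" pitches (assumed)
    """
    if ps[i] is None:
        # No pitch on the near-end side: nothing to compare (assumed policy).
        return False
    for j in range(max(0, i - max_delay), i + 1):
        if pr[j] is not None and abs(pr[j] - ps[i]) <= tol * pr[j]:
            # Sin matches a recent far-end pitch: treat it as echo only.
            return False
    # Sin carries a pitch that the far-end speech did not have: double talk.
    return True


# Far-end pitch track and its echo delayed by three analysis steps.
pr = [100.0] * 5 + [120.0] * 5
ps_echo = [None, None, None] + pr[:7]
# Near-end talker with a different pitch speaking from step 6 onward.
ps_near = ps_echo[:6] + [80.0] * 4
```

With these tracks, no index of `ps_echo` is judged as double talk when `max_delay=3`, while indices 6 to 9 of `ps_near` are. The decision uses only pitch values, so scaling either signal does not change the result.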

(Second Embodiment)

FIG. 3 is a block diagram showing a schematic construction of an apparatus embodying a double talk detecting method according to the second embodiment of the present invention. This apparatus is located in the afore-mentioned control unit 4. In this illustration, the analyzed parameter extracting/analyzing unit 11 is constructed in the same manner as in the first embodiment.

A correlation value calculator 13 calculates a time correlation value $k_j$ of the analyzed parameters Pr(i) and Ps(i) input therein, in accordance with the following expression (13).

$k_j = \dfrac{1}{\sqrt{G_R G_S}} \sum_{i=j-N_S}^{j} P_r(i-N_D) \, P_s(i)$  (13)

where $N_D$ represents the echo delay expressed as a number of samples and $N_S$ represents the section over which the above-mentioned correlation calculation is made. $G_R$ and $G_S$ are expressed by the following expressions (14) and (15), respectively. The above-mentioned correlation value $k_j$ represents the degree of correlation of the fluctuations of the analyzed parameters Pr(i) and Ps(i) in a certain section (namely, from $i = j - N_S$ to $i = j$).

$G_R = \sum_{i=j-N_S}^{j} P_r^{2}(i-N_D)$  (14)

$G_S = \sum_{i=j-N_S}^{j} P_s^{2}(i)$  (15)

Then, the double talk detection unit 13 compares the correlation value $k_j$ thus calculated with a predetermined threshold $K_{TH}$ and judges whether or not the double talk is present, based on the result of comparison. In this case, if the condition of the following expression (16) is satisfied, i.e., if the correlation value $k_j$ becomes equal to or smaller than the threshold $K_{TH}$, it is judged that the double talk has occurred.

$k_j \leq K_{TH}$  (16)

In the above expression, the threshold is set within a range of $0 \leq K_{TH} < 1$. It should be noted that this threshold is set to an optimum value through various experiments, actual measurements, or the like.

The correlation value $k_j$ shows the degree of correlation of the fluctuations of the analyzed parameters of the far-end speech output Rout and the near-end speech input Sin. For example, in the case where only the far-end talker is engaged in speech and an echo is present, Rout corresponds to the speech of the far-end talker whereas Sin corresponds to its echo. In this case, the degree of correlation between the two is comparatively high. During that time, namely, while only an echo is present, the afore-mentioned learning function of estimation of an echo path works. However, in the case where both the far-end talker and the near-end talker are engaged in speech, i.e., in the case where the double talk is present, Rout corresponds to the speech of the far-end talker whereas Sin corresponds to the speech of the near-end talker. In this case, the speech pitches of the far-end talker and the near-end talker are different and therefore, the degree of correlation of the analyzed parameters between them is lowered. Therefore, when this degree of correlation falls to or below a predetermined value (namely, the threshold), it can be judged that the double talk has occurred. This embodiment is based on the principle described above.
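Expressions (13) to (16) translate directly into code. The following Python sketch is illustrative only: the function names, the threshold value 0.5, the handling of an all-zero section, and the sample sequences (written as zero-mean fluctuations) are assumptions made for the example and are not values given in the text.

```python
import math


def correlation_value(pr, ps, j, n_s, n_d):
    """Correlation value k_j of expression (13) over i = j - n_s .. j.

    pr, ps : analyzed parameters Pr(i), Ps(i) as lists of floats
    n_s    : length of the correlation section, N_S
    n_d    : echo delay in samples, N_D
    """
    if j - n_s - n_d < 0 or j >= len(ps) or j >= len(pr):
        raise ValueError("section reaches outside the available samples")
    idx = range(j - n_s, j + 1)
    g_r = sum(pr[i - n_d] ** 2 for i in idx)       # expression (14)
    g_s = sum(ps[i] ** 2 for i in idx)             # expression (15)
    if g_r == 0.0 or g_s == 0.0:
        return 0.0                                 # assumed convention
    cross = sum(pr[i - n_d] * ps[i] for i in idx)
    return cross / math.sqrt(g_r * g_s)


def is_double_talk(k_j, k_th=0.5):
    """Expression (16): double talk when k_j <= K_TH (threshold assumed)."""
    return k_j <= k_th


# Fluctuation of the far-end parameter and an attenuated echo delayed by 2.
pr = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0, 1.0, 2.0, -2.0, 3.0, -1.0, -3.0]
ps_echo = [0.0, 0.0] + [0.3 * v for v in pr[:-2]]
# A near-end parameter that does not follow the far-end fluctuation.
ps_other = [1.0] * 12

k_echo = correlation_value(pr, ps_echo, j=11, n_s=8, n_d=2)
k_other = correlation_value(pr, ps_other, j=11, n_s=8, n_d=2)
```

Because of the normalization by $G_R$ and $G_S$, the attenuation factor 0.3 of the echo does not affect `k_echo`, which stays close to 1, whereas `k_other` falls below the assumed threshold and is judged as double talk.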

(Third Embodiment)

A: Construction of the Embodiment

FIG. 4 is a block diagram showing a construction of one embodiment of the present invention. A control unit 104 in this embodiment, unlike the control unit 4 shown in FIG. 1, performs only the control of the non-linear processor 6. Specifically, it detects the far-end talker's single talk and controls only the switching operation of the non-linear processor 6. The double talk detection in this embodiment is performed within an echo path estimation circuit 103 a.

FIG. 5 is a block diagram showing a construction of an important portion of the echo path estimation circuit 103 a. In the illustration, reference numeral 10 denotes a computation unit for calculating a fluctuation ΔH of a tap coefficient H in accordance with the learning identification algorithm, and reference numeral 12 denotes an adder.

Similarly, reference numeral 25 denotes a processing register which is constructed in the same manner as the H-register 3 b. Output signals from the processing register 25 are supplied to a storage register 26-1 and to the adder 12, respectively. A plurality of such storage registers are provided, as represented by 26-1 to 26-n. The construction of each register is the same as that of the H-register 3 b. The tap coefficient H is transmitted stage by stage from the storage register 26-1 toward the storage register 26-n. Output data from the storage register 26-n is supplied to the H-register 3 b.

In general, the impulse response characteristic of the echo path is as shown in FIG. 6. This response characteristic of the echo path corresponds directly to the pattern of the tap coefficient H. More strictly, if the echo path estimation circuit 103 a performs a correct learning, the time series pattern (see the expression (7)) of the calculated tap coefficient H is the same as the impulse response characteristic of the echo path.

However, the response characteristic of the echo path differs slightly depending on the type of hybrid and on the variation in characteristics of individual hybrids. In this embodiment, the response characteristic of the echo path is classified into about 10 to 20 patterns which are selected so as not to cause substantial inconvenience in actual practice. The patterns indicative of the individual response characteristics are stored in memories M1 to Mn, respectively.

Reference numeral 20 denotes a double talk monitor. In response to a signal SEL, the double talk monitor 20 selects the memories M1 to Mn one after another and compares the response characteristics stored in the respective memories with the tap coefficient H output from the adder 12.

Then, the double talk monitor 20 watches how far the tap coefficient H, which is output from the adder 12, deviates from the response characteristics (reference values) stored in the memories. If it deviates from the response characteristics beyond the preliminarily established range of allowable errors for all of the memories M1 to Mn, the double talk monitor 20 outputs a double talk detection signal DT. In this case, information indicative of the allowable errors is also stored in the memories M1 to Mn.
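The judgment made by the double talk monitor 20 can be sketched as a loop over the stored patterns. The text does not specify how the deviation is measured, so the Euclidean distance used below, the function name, and the sample patterns are assumptions made purely for illustration.

```python
import math


def double_talk_monitor(taps, memories):
    """Return True when the DT signal should be output (sketch).

    taps     : tap coefficient H output from the adder (list of floats)
    memories : list of (reference_pattern, allowable_error) pairs, standing
               in for the memories M1 to Mn
    The deviation measure (Euclidean distance) is an assumption.
    """
    for reference, allowable_error in memories:
        deviation = math.sqrt(
            sum((h - r) ** 2 for h, r in zip(taps, reference))
        )
        if deviation <= allowable_error:
            # Close enough to one stored response characteristic.
            return False
    # Outside the allowable range for every stored pattern.
    return True


# Two hypothetical stored response characteristics with allowable errors.
memories = [
    ([1.0, 0.5, 0.1], 0.2),
    ([0.8, -0.4, 0.2], 0.2),
]
```

For example, `double_talk_monitor([0.95, 0.45, 0.12], memories)` is false because the taps lie near the first pattern, while `double_talk_monitor([0.2, 0.9, -0.7], memories)` is true because the taps are outside the allowable range of both patterns.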

B: Operation of the Embodiment

Operation of the embodiment thus constructed will now be described. Firstly, when the speech is started, the computation unit 10 calculates a ΔH capable of reducing the residual echo in accordance with the learning identification algorithm, and the obtained ΔH is added to the current tap coefficient H (the tap coefficient in the processing register 25) to prepare the tap coefficient for the next time point. The prepared tap coefficient is supplied to the processing register 25. By repeating the above procedure, the tap coefficient H in the processing register 25 is adaptively controlled. The tap coefficient H, which has been supplied to the processing register 25, is transmitted stage by stage through the storage registers 26-1 to 26-n. Therefore, the tap coefficient H stored in the storage register 26-n is a tap coefficient from several samples earlier. Since the tap coefficient H stored in the storage register 26-n is transmitted to the H-register 3 b, the echo replica generator 3 c generates an echo replica based on the tap coefficient from several samples earlier.

If no double talk occurs at that time, the tap coefficient H, which is output from the adder 12, coincides with the response characteristic stored in one of the memories M1 to Mn or converges into the range of the allowable errors. Therefore, the double talk monitor 20 does not output the double talk detection signal DT.

On the other hand, when the double talk occurs while the speech is going on, the computation unit 10 is brought into a mis-learning state because the speech signal of the near-end talker is superimposed at the adder 5. As a consequence, the calculated value of ΔH does not correspond to the impulse response of the echo path. Consequently, the value of the tap coefficient H, which is output from the adder 12, deviates from the response characteristics stored in the memories M1 to Mn. When the deviation exceeds the allowable limit of errors with respect to all of the memories M1 to Mn, the double talk monitor 20 outputs the double talk detection signal DT.

At the time point the double talk detection signal DT is output, the tapcoefficient H, which is stored in the storage register 26-n intendedbefore several samplings, i.e., the correct coefficient H before thedouble talk state is created, is transmitted to the H-register 3 b.Consequently, the echo replica generator 3 c does not generate anunnecessary echo replica and generates an echo replica based on the tapcoefficient H which is calculated immediately before the double talkstate is created. When the double talk detection signal DT is supplied,the storage registers 26-1 to 26-n do not accept any new input but keepholding the internal data. Therefore, the tap coefficient H supplied tothe H-register 3 b holds the value immediately before the double talkstate is created.

On the other hand, the tap coefficient written in the processingregister 25 is successively supplied to the adder 12 in order to add ΔHthereto. In this case, since the double talk state is present, the valueof ΔH becomes a wrong value, the tap coefficient H remains as “adeviated value” with respect to the reference value (responsecharacteristic in each memory).

When the double talk state is finished and only the speech signal of the far-end talker is present, ΔH calculated by the computation unit 10 gradually becomes a correct value, so that the tap coefficient H gradually approaches the reference value and finally converges into the allowable range. As a consequence, the double talk monitor 20 stops the transmission of the double talk detection signal DT.

When transmission of the double talk detection signal DT is stopped, the storage registers 26-1 to 26-n resume the shift of the tap coefficient H. Note that the input end of the storage register 26-n remains closed until the content held by the processing register 25 at the time point when the transmission of the double talk detection signal DT stopped is supplied to it. As a consequence, the content of the H-register 3 b is not renewed until the arrival of a correctly renewed tap coefficient. In this way, the state which existed before the double talk was detected is created again, and a correctly renewed tap coefficient H is supplied to the H-register 3 b.

As is apparent from the foregoing description, even when the double talk is detected, the H-register 3 b holds a tap coefficient from several samplings earlier and is therefore hardly susceptible to any adverse effect caused by mis-learning. When the double talk state is finished, a tap coefficient reflecting the result of learning is transmitted again to the H-register 3 b.
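The hold-and-resume behaviour of the storage registers 26-1 to 26-n and the H-register 3 b can be modelled roughly as below. This is a simplified sketch under stated assumptions: the class name `DelayedTapStore`, the per-sample `step` interface, and the hold-off counter used to model the closed input of register 26-n after DT is released are all hypothetical and are not taken from the embodiment.

```python
from collections import deque
from typing import List, Sequence


class DelayedTapStore:
    """Toy model of an n-stage tap-coefficient delay line feeding an H-register."""

    def __init__(self, stages: int, initial: Sequence[float]) -> None:
        self.stages = stages
        # Each entry plays the role of one storage register (26-1 .. 26-n).
        self._regs = deque((list(initial) for _ in range(stages)), maxlen=stages)
        self.h_register: List[float] = list(initial)
        self._was_dt = False
        self._holdoff = 0

    def step(self, h: Sequence[float], dt: bool) -> List[float]:
        """Feed the newest tap coefficient; return what the H-register holds."""
        if dt:
            # Double talk: registers accept no input, H-register keeps its value.
            self._was_dt = True
            return self.h_register
        if self._was_dt:
            # DT just released: wait until a post-release coefficient has
            # travelled through all stages before renewing the H-register.
            self._was_dt = False
            self._holdoff = self.stages
        oldest = self._regs[0]
        self._regs.append(list(h))  # shift; the oldest entry drops out
        if self._holdoff > 0:
            self._holdoff -= 1
        else:
            self.h_register = oldest
        return self.h_register
```

In this model a coefficient computed during double talk never reaches the H-register, and the first coefficient delivered after release is the one computed at the sample where DT stopped.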

In this way, according to this embodiment, the impulse response characteristics of the echo path are stored in the storage means and a judgment is made as to whether or not the double talk is present, based on the result of comparison between the output tap coefficient from the echo path estimation means and the impulse response characteristics within the storage means. Therefore, a reliable double talk detection can be performed in accordance with the characteristics of the hybrids.

Further, the tap coefficient, which is output from the echo path estimation means, is stored in the tap coefficient delayed storage means after the delay of a plurality of stages. With this feature, an echo replica can be generated by a tap coefficient from before the double talk occurs.

(Fourth Embodiment)

FIG. 8 is a block diagram showing a schematic construction of the fourth embodiment, which embodies a double talk detecting method according to the present invention. This apparatus is located within the control unit 4. In this illustration, reference numerals 41 and 42 denote band-pass filter groups, each constituted of a plurality of band-pass filters (BPF). The band-pass filter group 41 divides the signal of the far-end speech output Rout into a plurality of frequency bands (hereinafter referred to as the “sub-bands”) and outputs the divided signals. The band-width of each sub-band is selected to be an optimum width through various experiments and actual measurements. Similarly, the band-pass filter group 42 divides the signal of the near-end speech input Sin into a plurality of sub-bands and outputs the divided signals. The sub-bands in the band-pass filter group 41 correspond to the sub-bands in the band-pass filter group 42, respectively.

Reference numerals 43₁ to 43ₙ denote comparators. The number n (n represents an integer of 2 or more) of the comparators corresponds to the number of the sub-bands. That is, each comparator 43 compares the far-end speech output Rout with the near-end speech input Sin in its sub-band. When the far-end speech output Rout and the near-end speech input Sin differ in power, each comparator 43 outputs a signal to that effect.

Output signals from the comparators 43₁ to 43ₙ are input into an OR circuit 44. When a signal is output from at least one of the comparators 43₁ to 43ₙ, the OR circuit 44 generates a double talk detection signal DT indicating that the double talk has occurred and outputs it to the echo path estimation circuit/echo replica generator 3. By doing this, the learning function of the echo path is turned off.

FIG. 9 shows one example of a construction of an internal circuit of each comparator 43. Here, reference numeral 45 denotes a power ratio calculator. From the far-end speech output Rout and the near-end speech input Sin in each sub-band, the power ratio calculator 45 calculates the power ratio of the two. The power ratio calculated here is compared with a predetermined threshold TH in the comparator unit 46, and a signal is output when the power ratio exceeds the threshold.
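The arrangement of per-band comparators followed by an OR circuit can be illustrated with the short sketch below. It is not the circuit of FIG. 8 or FIG. 9: instead of band-pass filters it groups the bins of a naive discrete Fourier transform into equal-width bands, it assumes the ratio is taken as Sin power over Rout power, and the function names, the number of bands, the threshold value and the small constant `eps` are all assumptions chosen for illustration.

```python
import cmath
import math
from typing import List, Sequence


def band_powers(x: Sequence[float], n_bands: int) -> List[float]:
    """Power of x in n_bands equal-width bands (naive DFT, positive bins only)."""
    n = len(x)
    half = n // 2
    spectrum = [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(half)
    ]
    width = half // n_bands
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(n_bands)]


def sub_band_flags(r_out: Sequence[float], s_in: Sequence[float],
                   n_bands: int = 4, threshold: float = 0.5,
                   eps: float = 1e-12) -> List[bool]:
    """One flag per sub-band: True when the Sin/Rout power ratio exceeds threshold."""
    if len(r_out) != len(s_in):
        raise ValueError("both frames must have the same length")
    p_r = band_powers(r_out, n_bands)
    p_s = band_powers(s_in, n_bands)
    return [ps / (pr + eps) > threshold for pr, ps in zip(p_r, p_s)]


def detect_double_talk(r_out: Sequence[float], s_in: Sequence[float],
                       n_bands: int = 4, threshold: float = 0.5) -> bool:
    """OR of the per-band comparator outputs."""
    return any(sub_band_flags(r_out, s_in, n_bands, threshold))
```

A weak near-end tone that lies in a band where the far-end signal has little energy raises the flag for that band even when the full-band power ratio stays below the same threshold, which is the situation the sub-band division is meant to handle.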

Principles of operation of the present invention will now be described. In the present invention, the signal of Rout and the signal of Sin are each divided into sub-bands. Then, the signal on the Rout side and the signal on the Sin side are compared with each other in each sub-band, and it is judged that the double talk has occurred when the two signals differ or when the power ratio of the two signals exceeds a predetermined threshold. Conventionally, the judgment as to whether or not the double talk is present has been made merely based on the power ratio, without dividing each signal into sub-bands, so that the accuracy of double talk detection has been affected by the speech levels of the far-end talker and near-end talker. The present invention solves this problem by dividing each signal into a plurality of sub-bands and comparing them.

Usually, the speech of the far-end talker has a different frequency characteristic from that of the near-end talker. The reason is that the contents of the speeches of the far-end talker and near-end talker are different. Even if the contents of their speeches are the same, their frequency characteristics are necessarily different because the vibration waveforms of the vocal cords differ from person to person. However, the double talk has heretofore been detected merely based on the power ratio of the signals, disregarding the difference in frequency characteristic. The present invention pays attention to this difference in frequency characteristic and divides the waveform of the speech of each person into sub-bands. As described above, since the frequency characteristic of the speech of each person is different, even if the speech transmitting power of the far-end talker is larger than that of the near-end talker and both of them are uttering the same sound, for example, their power ratio becomes noticeable in at least one of the sub-bands. Therefore, if this is detected, the occurrence of double talk can be correctly detected.

(Modified Embodiment)

The present invention is not limited to the above-mentioned embodiments. For example, various modifications can be made as hereinafter described.

(1) In the above-mentioned first and second embodiments, speech pitches are used as the analyzed parameters. However, the subject matter of the present invention is not limited to that, and other parameters can be applied. For example, formant, band width, etc. can be employed as the analyzed parameters. With respect to methods for analyzing such a talker's speech, a variety of methods are already proposed and known, and therefore detailed description thereof is omitted in this specification. What is important is that the analyzed parameters can extract the features of the talker's voice.

(2) In the above-mentioned first to fourth embodiments, the present invention is applied to a signal transmission between a mobile telephone and a terrestrial telephone. However, the application of the present invention is not limited to this. The present invention can likewise be applied to all communication networks of the type in which signals are transmitted between a two-wire circuit and a four-wire circuit.

(3) In the above-mentioned second embodiment, the double talk is detected by comparing the correlation with the threshold. In the alternative, the double talk may be detected based on changes in the state of correlation or the like, instead of comparing the correlation with the threshold.

(4) In the third embodiment, the allowable errors are stored in the memories M1 to Mn. In the alternative, the allowable errors may be stored in the double talk monitor 20.

(5) Further, in the third embodiment, the double talk is detected by means of comparison with the impulse response characteristics in the memory. However, other detecting methods may be used. For example, it can be judged that the double talk has occurred when the value of ΔH is greatly deviated from a usually expectable range or when some signals are generated during the delayed time.

The latter method will be described with reference to FIG. 7. Presume that the impulse response, which was detected after the start of speech, has a delayed time DT1 as shown in FIG. 7(a). If the echo path estimation circuit 103 a is operated normally, no signals are supposed to be generated during this time zone. However, if a tap coefficient H having errors is calculated because of a wrong detection of the double talk, an unnecessary echo is generated in the echo replica generator 3 c. As a consequence, a signal is detected within the delayed time DT1, as shown in FIG. 7(b). Therefore, an arrangement can be made such that the delayed time is preliminarily measured and stored, so that it can be judged that the double talk has occurred when some signals are detected during that time. Other methods may be employed as necessary. What is important here is an arrangement in which it is judged that the double talk has occurred when something not normally expected appears.
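The delayed-time check described above amounts to looking for activity in the leading part of the estimated impulse response, where a pure delay should leave nothing. A minimal sketch follows; the function name, the representation of the delayed time as a number of taps, and the noise floor value are assumptions introduced here.

```python
from typing import Sequence


def signal_in_delay_window(h: Sequence[float], delay_taps: int,
                           floor: float = 1e-3) -> bool:
    """True when any tap inside the stored pure-delay window exceeds the floor."""
    return any(abs(c) > floor for c in h[:delay_taps])
```

For a stored delay of four taps, a response whose first four taps are zero passes the check, while one with energy in those taps is treated as a sign of double talk.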

(6) As is apparent from the description of the third embodiment, according to the present invention, a greatly deviated value is not transmitted to the H-register 3 b. For this reason, the requirement for stability with respect to mis-learning of the computation unit 10 is relaxed, and the learning speed of an echo path can be set faster to that extent. That is, by increasing an adaptation constant of the tap coefficient H by setting a of the expression (4) or g of the expression (11) large, the follow-up speed of the tap coefficient H can be increased. If the learning speed is set sufficiently high and the accuracy of the reference values held by the memories M1 to Mn is increased, the double talk can be detected even faster. Further, reducing the number of the storage registers 26-1 to 26-n, if necessary, will not cause a greatly deviated value to be transmitted to the H-register 3 b.

Furthermore, if the learning speed is sufficiently high, the accuracy of the reference values is high, and no particular inconvenience arises under the circumstances of use, an arrangement can be made in which the storage registers 26-1 to 26-n are not employed and the tap coefficient H immediately after the detection of the double talk is held in the H-register 3 b, so that it can be used continuously while the double talk is being detected.

(7) In the third embodiment, a plurality of memories for storing impulse response characteristics are employed. However, in the case where the hybrids to be used are somewhat specified and their characteristics are substantially the same, a single memory is sufficient.

(8) In the fourth embodiment, groups of band-pass filters are employed as means for dividing a speech signal into a plurality of sub-bands and outputting the same. However, the present invention is not limited to this. Other means, such as other kinds of filters and calculators, may be employed.

(9) The present invention is not limited to the technique of dividing a speech signal into a plurality of sub-bands. Various kinds of techniques for comparing speech signals in the frequency domain may be employed. For example, if a comparison between the frequencies of the basic components of the speech signals on the side of a far-end talker and on the side of a near-end talker reveals that the difference between the two frequencies is equal to or smaller than a predetermined value, it may be judged that the double talk has occurred. Furthermore, an FFT (fast Fourier transform) or the like may likewise be employed as the frequency analyzer.

What is claimed is:
 1. A double talk detecting method applicable to an echo canceler which is employed in a telephone line network for performing a transmission between a speech coming through a four-wire circuit and another speech coming through a two-wire circuit, comprising: a first step of extracting a first speech pitch from a first speech signal corresponding to a speech on said four-wire circuit side, said first speech pitch being specified by a waveform of said first speech signal; a second step of extracting a second speech pitch from a second speech signal corresponding to a speech on said two-wire circuit side, said second speech pitch being specified by a waveform of said second speech signal; a third step of comparing said first speech pitch with said second speech pitch and judging whether or not a double talk is present, based on a result of the comparison; wherein said third step further comprises the steps of calculating correlation values of fluctuations of the speech pitches of said first and second speech signals; and judging whether or not a double talk is present, based on said correlation values of fluctuations of the speech pitches of said first and second speech signals.
 2. A double talk detecting apparatus applicable to an echo canceler which is employed in a telephone line network for performing a transmission between a speech transmitted coming through a four-wire circuit and another speech coming through a two-wire circuit, comprising: first feature extraction means for extracting a first feature from a first speech signal corresponding to a speech on said four-wire circuit side, said first feature being specified by a waveform of said first speech signal; second feature extraction means for extracting a second feature from a second speech signal corresponding to a speech on said two-wire circuit side, said second feature being specified by a waveform of said second speech signal; double talk judgment means for comparing said first feature with said second feature and judging whether or not a double talk is present, based on a result of the comparison; wherein said first feature is a result of sampling of said first speech signal, said second feature is a result of sampling of said second speech signal, and said double talk detection means comprises: echo path estimation means for estimating an echo path by a learning identification algorithm; and storage means for preliminarily storing a state corresponding the result of estimation when the double talk does not present, content stored in said storage means being compared with the result of estimation made by said echo path estimation means and a fact as to whether or not a double talk is present is detected based on the comparison.
 3. A double talk detecting apparatus according to claim 2, wherein said storage means stores a delay time as a state corresponding to the result of estimation when the double talk does not present; and said double talk judgment means compares said delay time with the result of estimation and judges that a double talk is present when a whatever signal is detected during the delay time.
 4. An echo canceler to be employed in a telephone line network for performing a transmission between a speech coming through a four-wire circuit and another speech coming through a two-wire circuit, comprising: echo path estimation means for estimating an echo path through a learning identification algorithm and outputting a tap coefficient corresponding to the result of estimation; an echo replica generator for generating an echo replica through a convolutional operation performed based on said tap coefficient; storage means for storing an impulse response characteristic of an echo path; judgment means for comparing the tap coefficient output by said echo path estimation means with an impulse response characteristic in said storage means and judging that a double talk is present when the result of comparison exceeds a predetermined allowable value; and control means for stopping the output of the tap coefficient from said echo path estimation means when said judgment means judged that a double talk is present.
 5. An echo canceler according to claim 4, wherein said echo path estimation means comprises tap coefficient delayed storage means; a tap coefficient stored in said tap coefficient delayed storage means, instead of said echo path estimation means, being supplied to said echo replica generation means when said judgment means judged that a double talk is present.
 6. A double talk detecting method according to claim 1, wherein said first speech pitch is a frequency component of said first speech signal, and said second speech pitch is a frequency component of said second speech signal.
 7. A double talk detecting method applicable to an echo canceler which is employed in a telephone line network for performing a transmission between a speech coming through a four-wire circuit and another speech coming through a two-wire circuit, comprising: a first step of extracting a first speech pitch from a first speech signal corresponding to a speech on said four-wire circuit side, said first speech pitch being specified by a waveform of said first speech signal; a second step of extracting a second speech pitch from a second speech signal corresponding to a speech on said two-wire circuit side, said second speech pitch being specified by a waveform of said second speech signal; a third step of comparing said first speech pitch with said second speech pitch and judging whether or not a double talk is present, based on a result of the comparison; wherein said third step further comprises the steps of calculating correlation values of fluctuations of the speech pitches of said first and second speech signals; comparing said correlation values with a predetermined threshold; and judging whether or not a double talk is present, based on a result of the comparison.
 8. A double talk detecting apparatus applicable to an echo canceler which is employed in a telephone line network for performing a transmission between a speech transmitted coming through a four-wire circuit and another speech coming through a two-wire circuit, comprising: first feature extraction means for extracting a first feature from a first speech signal corresponding to a speech on said four-wire circuit side, said first feature being specified by a waveform of said first speech signal; second feature extraction means for extracting a second feature from a second speech signal corresponding to a speech on said two-wire circuit side, said second feature being specified by a waveform of said second speech signal; double talk judgment means for comparing said first feature with said second feature and judging whether or not a double talk is present, based on a result of the comparison; wherein said first feature is a result of sampling of said first speech signal, said second feature is a result of sampling of said second speech signal, and said double talk detection means comprises: frequency band division means for dividing said first and second speech signals into a plurality of frequency bands, respectively and outputting the same; comparator means for comparing said first and second speech signals in each of said frequency bands and obtaining power ratios therebetween; and double talk detection means for judging whether or not a double talk is present based on the power ratio in each of said frequency bands.
 9. A double talk detecting apparatus according to claim 8, wherein said frequency band division means is constituted of a plurality of band-pass filters.
 10. A double talk detecting apparatus according to claim 8, wherein said double talk detection means judges that a double talk has occurred when the power ratio in at least one frequency band exceeds a predetermined threshold.